Arabic Language Modeling with Finite State Transducers

Author

  • Ilana Heintz

Abstract

In morphologically rich languages such as Arabic, morpheme combination produces far more distinct word forms than in languages with less inflection (Kirchhoff et al., 2006). This exacerbates the out-of-vocabulary (OOV) problem: test set words are more likely to be unknown to the language model, which limits its effectiveness. The goal of this study is to exploit the regularities of Arabic inflectional morphology to reduce the OOV problem in that language. We hope that success in this task will lead to a lower word error rate in Arabic automatic speech recognition.
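The OOV argument can be made concrete with a toy example. This is a minimal sketch under stated assumptions, not the author's method. The prefix and suffix lists, `MIN_STEM`, and the Latin-transliterated word forms are all hypothetical and not linguistically exact. A plain greedy affix-stripping function stands in for a finite-state morphological analyzer; it is not an FST implementation. The sketch compares the OOV rate of a small test set against a whole-word vocabulary and against a morpheme vocabulary built from the same training words.

```python
# Toy sketch: modeling morphemes instead of whole words lowers the OOV rate.
# Hypothetical: affix lists, MIN_STEM, and the transliterated words below.
# A greedy affix stripper stands in for a finite-state morphological analyzer.
from typing import Iterable, List, Set

PREFIXES = ("wa", "al", "bi")  # toy proclitics: "and", "the", "with"
SUFFIXES = ("at", "un")        # toy plural suffixes
MIN_STEM = 3                   # never strip an affix if fewer letters would remain


def segment(word: str) -> List[str]:
    """Greedily strip any number of known prefixes, then at most one suffix."""
    morphs: List[str] = []
    stripped = True
    while stripped:
        stripped = False
        for p in PREFIXES:
            if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
                morphs.append(p + "+")  # mark as prefix
                word = word[len(p):]
                stripped = True
                break
    suffix = None
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            suffix = "+" + s  # mark as suffix
            word = word[: -len(s)]
            break
    morphs.append(word)  # the remaining stem
    if suffix is not None:
        morphs.append(suffix)
    return morphs


def segment_all(words: Iterable[str]) -> List[str]:
    """Flatten the segmentations of all words into one morpheme sequence."""
    return [m for w in words for m in segment(w)]


def oov_rate(tokens: List[str], vocab: Set[str]) -> float:
    """Fraction of tokens that do not appear in the vocabulary."""
    if not tokens:
        return 0.0
    return sum(t not in vocab for t in tokens) / len(tokens)


train = ["alkitab", "kabir", "wakalb", "muslimat"]
test = ["wakitab", "bikalb", "alkabir", "muslimun"]

word_oov = oov_rate(test, set(train))
morph_oov = oov_rate(segment_all(test), set(segment_all(train)))

if __name__ == "__main__":
    print(f"word-level OOV rate:     {word_oov:.3f}")   # 1.000
    print(f"morpheme-level OOV rate: {morph_oov:.3f}")  # 0.250
```

Every test word is unseen as a whole word, yet most of its morphemes (`wa+`, `kitab`, `al+`, `kabir`, `kalb`, `muslim`) already occur in training; only `bi+` and `+un` remain unknown. The two rates use different denominators (4 words versus 8 morphemes), so the sketch illustrates the coverage effect rather than a like-for-like comparison.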

Similar resources

Arabic Diacritization Using Weighted Finite-State Transducers

Arabic is usually written without short vowels and additional diacritics, which are nevertheless important for several applications. We present a novel algorithm for restoring these symbols, using a cascade of probabilistic finite-state transducers trained on the Arabic treebank, integrating a word-based language model, a letter-based language model, and an extremely simple morphological model. ...

Modeling Imperative String Operations with Transducers

We present a domain-specific imperative language, Bek, that directly models low-level string manipulation code featuring boolean state, search operations, and substring substitutions. We show constructively that Bek is reversible through a semantics-preserving translation to symbolic finite state transducers, a novel representation for transducers that annotates transitions with logical formula...

Novel Probabilistic Finite-State Transducers for Cognate and Transliteration Modeling

We present and empirically compare a range of novel probabilistic finite-state transducer (PFST) models targeted at two major natural language string transduction tasks, transliteration selection and cognate translation selection. Evaluation is performed on 10 distinct language pair data sets, and in each case novel models consistently and substantially outperform a well-established standard re...

Explicit Modeling of Phonological Changes in Finite-State Transducer Based Hungarian LVCSR

This article describes the operation and the experimental evaluation of the pronunciation modeling component of the first Hungarian large vocabulary continuous speech recognition system. The proposed method is based on the implementation of context dependent rewrite rules by weighted finite state transducers (WFSTs). The proposed phonological model decreases the error rate by 8.32% relatively c...

Rational Kernels for Arabic Stemming and Text Classification

In this paper, we address the problems of Arabic Text Classification and stemming using Transducers and Rational Kernels. We introduce a new stemming technique based on the use of Arabic patterns (Pattern Based Stemmer). Patterns are modelled using transducers and stemming is done without depending on any dictionary. Using transducers for stemming, documents are transformed into finite state tr...

Publication date: 2008